home *** CD-ROM | disk | FTP | other *** search
- This document deals with the inner workings. If you're a tech type, and are
- interested in somewhat deeper lying aspects relating to how the Imploder
- operates, chances are you'll find it here.
- Subjects covered are:
-
- - Compression
- - Decompression
- - Reversibility
- - The Library
- - Overlayed Files
- - Merging
- - ARP
- - The Music
- - The Copperlist
- - 68040 Cache Coherency
-
-
- ** Compression **
-
- The Imploder (we only recently learned :-) does LZ77 like compression with a
- per-mode static Huffman coding step on the various parts of the skip, offset
- and length tuples. Due to the efficient encoding, a tuple can require less
- than 12 bits, and thus strings of 2 bytes length are encodable with a decent
- gain (given small Huffman patterns corresponding to likely circumstances).
-
- To speed up the string searching, the turbo mode causes the accessible strings
- to be indexed using a hashing table. However, the fact that strings with a
- minimum size of two bytes are still potential candidates for compression,
- requires the hashing function to necessarily be rather simplistic. When the
- implosion algorithm processes highly redundant data, entries in the hashing
- table tends to get very imbalanced, causing a reduction in performance.
- For most types of data, notably executables, this isn't the case though.
-
-
- ** Decompression **
-
- The goal of decompression is to reproduce the segment list of the
- original program. This is a linked list of data/code/bss hunks, some
- of which require being positioned in chip memory.
-
- The decompression code will have to fill the data and code hunks with
- the decompressed data, and, if required, perform relocation of absolute
- references.
-
- The Imploder lets LoadSeg allocate all target hunks (as BSS). These
- are at the start of the hunk list. This is followed by a hunk with
- the decompression code (only for non-library imploded programs),
- a compressed data hunk (normally about 50% of the static data size,
- depending on compression gain ofcourse), and a decompression buffer
- (only upto 17K in size).
- As you can see, no allocations need to be done, so the exploding
- process will never fail.
-
- During decompression time, data is decompressed from the compressed data
- buffer into the decompression buffer until it has filled. It then empties
- this buffer by filling hunks and processing reloc tables.
- When the buffer has been emptied, the process repeats until all data has
- been scatter-decompressed across the target hunks.
-
- Now you might ask, why not directly decompress to the target hunks
- instead of using a decompression buffer?
- This is because the the Imploder uses the decompressed data to
- decompress, and needs to be able to easily access it (for speed).
- Referencing data distributed across several hunks is much more
- cumbersome.
-
- An added bonus of having separate source and target memory regions is that
- a notation can be used that doesn't gain under rare circumstances, but on
- average yields better results (No chance of source/target pointer
- collision).
-
- When explosion has finished, the decompression buffer, code and data
- hunks are freed, and memory usage reduced to what it would have been
- for a non-imploded version of the program.
-
- People often compare the Imploder to the Power Packer. The Imploder
- decompresses faster and looks cooler [:-)], but the most interesting
- differences lie in the implementation of the various steps of the
- decompression proccess. So let's contrast the advantages of the
- Imploder's approach with the Power-Packer's implementation.
-
- - By having LoadSeg allocate all memory as BBS hunks, explosion will
- never fail.
- The Power-Packer on the other hand allocates hunks. If it fails it
- will simply exit. Power Packed programs launched from the WorkBench
- thus won't reply the startup message, which will be left dangling in
- memory forever.
-
- - Memory doesn't get fragmented. The explosion related hunks are at
- the end of the seglist and thus were allocated (by LoadSeg) AFTER
- the target hunks.
- This isn't true for the Power-Packer. It does leave a hole in your
- free memory list when it frees its decompression stuff.
-
- - Additional memory usage is only about 50% of the static data size +
- the size of the decompression buffer, which is always small relative
- to large programs (maximum 17k).
- So a 30K program might require 62K to decompress (30+15+17), a 300K
- program will require 467K (300+150+17), assuming a 50% compression
- reduction.
- The memory usage report generated after a program has been imploded
- includes BSS hunks. I've discussed only static data here. BSS hunks
- don't require any extra memory usage of course.
- Power-Packed files require a buffer as large as the original program
- for both compressed data storage and decompression. Memory usage is
- therefore always about twice the static data size (again ignoring BSS)
- while for the Imploder it drops to 1.5 for executables large enough to
- matter memory wise.
-
- (In this comparison I'm talking about executables as produced
- by Power-Packer version 3.0b.)
-
- Non-library imploded programs have a small first hunk that calls the
- decompression code hunk at the end, and frees these last three hunks.
- For library imploded programs this freeing occurs in the library, so no
- preceding hunk is needed.
-
-
- ** Reversibility **
-
- Before compression the Imploder preprocesses executables. It kicks out all
- the redundant stuff by merging subreloc tables referring to the same hunk,
- sorting relocs (improves compression), removing debug symbols etc. etc.
- This is what all those info blurbs in the text window are about.
-
- So the deploded executable isn't guaranteed to be byte by byte identical
- as far as loadfile control codes are concerned.
-
- What is guaranteed is that the memory images created when the original,
- imploded and deploded program versions are loaded are identical.
-
- So the deplosion process isn't 100% reversible. Normally this is no
- problem. The reason for uncrunching is mostly wanting to recompress
- an executable using a different compression mode, or having a quick
- peek at the code e.g. when applying a patch with something like
- NewZap.
- If however you expect to need the debug symbols, or (important)
- require the executable to be in the _exact_ original format in order
- to have things like lpatch updates to applied, you're out of luck
- if you've only got the compressed executable. So always keep the
- original archives/disks!
-
- This is yet another argument for retaining the original archives.
- The Imploder is an online space creator not a distribution
- archiver (See the "Philosophy" text).
-
-
- ** The Library **
-
- The library code has been unrolled a bit, and optimized here and there
- in order to achieve optimal performance. This makes it faster than the
- normal explosion algorithm.
-
- If you library implode a program there is NO way in which the program,
- after explosion, will be able to notice. If you make sure the library
- is resident, this is also true for any executable file loaded for any
- purpose by any program.
-
- For normal etc. imploded programs the startup/cleanup hunk mentioned
- at the end of the "decompression" section might be detected if a program
- goes through contortions involving finding the segment list via murky
- DOS structures instead of simply doing PC relative hunk start referencing
- which also works from the WorkBench.
- I haven't encountered any programs that do this. Still this is yet another
- reason to use the library; there is not even the slightest chance of it
- being incompatible with an executable.
-
- Note that the Loadseg vector is patched in an "intelligent" manner; it
- will install fine for pre 2.0 kickstarts (braindead jumptable format)
- as well as in BCPL free systems (2.0+)
-
- Under pre 2.0, when a library imploded file is run from the WorkBench, and
- the explode.library isn't resident yet, Exec will try to load the library
- from disk. The process's message port however is in use by the WorkBench
- reply message, and until it has been replied, it cannot be used by the
- DOS in order to send packets. Thus the DOS gurus.
-
- Also, BCPL code doesn't jump through the the library vector. The only
- structural problem with this are handlers. These are loaded by the DOS,
- and the DOS is BCPL code, again ONLY under < 2.0. Under 2.0 the library
- works just like intended when it was first conceived. Transparently that
- is.
-
-
- ** Overlayed Files **
-
- The Imploder compresses the load part of an overlayed file as if it were a
- normal executable file. Subsequently, the overlay table and the overlayed
- program section are appended.
- It then tries to adapt the overlay table. Because different types of
- overlay supervisors are in use, the format of the overlay table isn't
- known to the Imploder. The only assumption made is that the overlay table
- contains seek-offset longwords, at longword aligned locations, that point
- into the file to the hunk_header ($3F3) identifiers of the overlayed
- program sections.
- This is how the standard overlay manager operates, but nothing prevents
- a programmer with sufficient technical knowledge to create a novel overlay
- format (e.g. selfextracting DImp files).
-
- If the Imploder finds one of these offsets, it is adjusted by the amount
- the initial load part of the executable file has compressed.
- The deplosion algorithm also tries to find these offsets when restoring the
- overlay table. Thus there is always a very small chance that the imploder
- will adapt a longword that was never meant to be an offset.
-
- An overlayed file gets its information from the loader in four longwords,
- at the start of the first hunk. In an imploded overlayed file, this hunk is
- the root hunk, and after decrunching these longwords are adjusted and moved
- into the first hunk of the actual program (the second hunk of the seglist).
-
- Evidently this process can never be 100% deterministic, so take heed and
- test any overlayed programs you've Imploded. Or don't use overlay implosion
- at all if you can spare the bits.
-
-
- ** Merging **
-
- Though modern linkers/compilers typically produce executables with one code
- hunk and one data hunk, there are still some old executables and less evolved
- linkers around. The merging option was implemented when executables with
- sufficient hunks to cause a lot of redundancy were still commonplace.
-
- Every hunk requires a longword in the allocation header, plus a hunk ID,
- load size, and hunk end ID. That's 16 bytes per hunk, and thus saved for
- every merge action. Doesn't sound like much, but generally hunks also
- have reloc tables. These waste a lot more space, especially with references
- to a lot of different hunks, though there's no easy equation.
-
- The merging step merges matching hunks (data-data, chipdata-chipdata,
- code-code) into hunks of upto the merge threshold in size. The actual
- size is of course determined by the sum of the sizes of the composite hunks,
- and may very well be a bit less than the specified threshold.
-
- Obviously this process discards redundant data in an irreversible fashion,
- so deploding the executable won't reverse it.
-
- Lots of tiny hunks cause memory fragmentation, but increase the chance of
- the program being able to load when the system is fragmented, and low on
- memory. Thus there is a kind of optimal balance that varies from system
- to system. In general it can be said that hunks less than 10K or more than
- 100K are "bad".
-
- Another factor is that loading a program with many tiny hunks causes the
- LoadSeg function to issue double as many tiny read commands, thus bogging
- down the speed with which an executable can be loaded into memory.
-
- For simplicity's sake, I've chosen for the Imploder to process executables
- within a single buffer, without the need for additional backup buffers.
- Thus, removing redundant information, and copying hunk data during the
- merging and reloc cleanup process involves moving or mirroring large parts
- of the buffer. This is why merging can take a while when processing a large
- executable with a hundred or so hunks.
-
-
- ** ARP **
-
- Programs written for use with the ARP shell are able to specify the
- stacksize they require inside the first hunk of their executable. If
- such a program is normal or pure imploded, the segment list won't become
- visible until the program is run. Thus ARP has no way of finding out what
- the proper stacksize should be.
- Library imploded programs have no trouble with this because they are
- already decrunched after they are loaded into memory with LoadSeg.
- (Provided the library has already been made resident.)
-
- The Imploder will recognize these files and report on them. If the
- requested stack-size is larger than the usual minimum (4000 bytes)
- a warning will be printed, and you'll be urged to use only library
- implosion.
-
- The chance of a programmer relying on a soon to be obsolete shell for
- setting a stack LARGER than the usual default is rather slim though.
- It would have been very nice if 2.0 had sported such a stack setting
- feature, and indeed it had been planned, but was never implemented due
- to lack of time on the part of the Commodore programmers.
-
- We'll be on the lookout for any future changes to the executable file
- format in order to fix any potential incompatibilities before they'll
- cause problems.
-
-
- ** The Music **
-
- When we got word the CIA-A timer was used by the OS under 2.0, we switched to
- trying to allocate first CIA-A and if not available CIA-B to drive the music.
- However the CIA-B interrupt priority is too high and can interfere with things
- like serial transfer. So Paul got this great idea to keep on using a CIA for
- precision timing purposes, but drop down to a SoftInt for doing the actual
- work, modulations, etc. This works great, the amount of code executed under
- CIA priority is now negligible.
- Recently, the CATS started feeling guilty about hijacking the CIA-A timer and
- thus created "Jumpy the magic timer device". If I understood things correctly
- the latest 2.0 timer device moves out of the way and starts using a less
- accurate timing source whenever an application tries to allocate the CIA-A.
- Pauls music driver can run of both CIA-A and CIA-B, and it would be a pity
- to make Jumpy jump without good reason, so he changed the alloction sequence
- from A-B to B-A.
-
-
- ** The Copperlist **
-
- There are a couple of unavoidable quirks when one uses copperlists on Intuition
- screens. On certain machines, probably PAL, under certain circumstances, dragging
- a dense copperlist past scanline 256 or so will cause some video crud to appear
- at the top of the display. This can't hurt, but it sure does look ugly. I suspect
- this is a hardware misfeature because it ain't fixed yet under 2.0
- This was the reason why the screen of older Imploder versions wasn't draggable;
- you might just think this muck was our doing.
-
- Second problem is that copperlists "shine through" onto other screens in front.
- For this reason we've choosen a colour > 4 for the level bars, so this is never
- observable with screens less than 3 bitplanes deep.
- The 2.0 OS support proper copperlist clipping, but it has been disabled by
- default for compatibility reasons (yeach). Supposedly there is a bit somewhere
- in the graphics base to turn this back on, so I'm sure, in due time, there will
- be some preference editor to re-enable this.
-
-
- ** 68040 Cache Coherency **
-
- With the advent of the 68040 processor, programs that diddle with code which is
- subsequently executed will be prone to some problems. I don't mean the usual
- self-modifying code causing the code cached in the data cache to no longer
- be as the algorithm expects. This is something the Imploder never had a
- problem with, indeed the Imploder has always worked fine with anything
- upto and including an 68030.
-
- The reason the 68040 is different is that it has a "copyback" mode. In this
- mode (which WILL be used by people because it increases speed dramatically)
- writes get cached and aren't guaranteed to be written out to main memory
- immediately. Thus 4 subsequent byte writes will require only one longword
- main memory write access. Now you might have heard that the 68040 does
- bus-snooping. The odd thing is that it doesn't snoop the internal cache
- buses!
-
- Thus if you stuff some code into memory and try to execute it, chances are
- some of it will still be in the data cache. The code cache won't know about
- this and won't be notified when it caches from main memory those locations
- which do not yet contain code still to be written out from the data caches.
- This problem is amplified by the absolutely huge size of the caches.
-
- So programs that move code, like the explosion algorithms, need to do a
- cache flush after being done. As of version 4.0, the appended decompression
- algorithms as well as the explode.library flush the cache, but only onder OS
- 2.0. The reason for this is that only OS 2.0 has calls for cache-flushing.
-
- This is yet another reason not to distribute imploded programs; they might
- just cross the path of a proud '40 owner still running under 1.3.
-
- It will be interesting to see how many other applications will run into
- trouble once the '40 comes into common use among Amiga owners. The problem
- explained above is something that could not have been easily anticipated
- by developers. It is known that the startup code shipped with certain
- compilers does copy bits of code, so it might very well be a large problem.
-